class: center, middle, inverse, title-slide

# Covid-19, Global Pandemic, and Data Science

### Team Chrissy & Ricky
Chrissy Aman and Ricky Sun

### Bates College

### 2022-04-11

---

## Outline

<style type="text/css">
.remark-slide-content {
  font-size: 30px;
  padding: 1em 4em 1em 4em;
}
</style>

- Introduction
- Literature Review
- Our Data
- Methods & Data Analyses
- Results
- Limitations and Potential Future Studies

---
class: inverse, center, middle
background-image: url("images/cool.png")

# Introduction

background-image: url(https://images.unsplash.com/photo-1535448033526-c0e85c9e6968?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80)

---

# Introduction

COVID-19 (coronavirus disease 2019) is a contagious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

As of yesterday, there have been over xxx infections and xxx deaths since the beginning of the pandemic.

Although no one could have predicted a pandemic like this, available data may help us, for example, evaluate risks, so that the virus can be better managed or even contained at earlier stages.

---

## Research Question

- In our research, we use Covid-19 data, together with other relevant data, to find potential predictors of Covid-19 cases, deaths, or vaccination rates.

- Do vaccinations effectively reduce the death rate?
- Can we implement machine learning algorithms to predict Covid-19 outcomes?
- Does a higher percentage of older people predict a higher death rate?
- What other variables help predict Covid-19 outcomes?

---
class: inverse, middle, center

# Literature Review

---
class: my-one-page-font

# [1] Covid-19 severity

- .small[The unpredictability of the progression of coronavirus disease 2019 (COVID-19) may be attributed to the low precision of the tools used to predict the prognosis of this disease, especially as the virus mutates rapidly, from Alpha to Omicron and more recent variants.]
- With the help of the models proposed in these papers (built on clinical data), we may better predict the severity of COVID-19 cases. These models can also support decisions about the care of patients infected with COVID-19.

<style type="text/css">
.my-one-page-font {
  font-size: 20px;
}
</style>

.footnote[vv]

---

# [2] Covid-19 and weather

- As with other respiratory tract infections, climatic conditions may significantly influence the COVID-19 pandemic. Since the beginning of the pandemic, significant efforts have been made to explore the relationship between climatic conditions and growth in the number of COVID-19 cases.

- These studies suggest significant interactions between temperature and relative humidity in the growth of COVID-19 cases and death rates. Air pollution may be another predictor of COVID-19 deaths. This evidence comes from studies in various countries.

---

# [3] Covid-19 and social media

- Data such as tweets or social media indexes, from Google Search, Twitter, Facebook, and other platforms, may also be used to develop models and to provide early warning signals of COVID-19 outbreaks. Social media data can also reveal people's perception of risk and the general mental state of a region.

---

# [4] COVID-19 and impacts

- Covid-19 has also had a great impact on our daily lives (racial issues, job markets, economic activity, and so on).

### An increase in vaccinations per capita has been found to be associated with a significant increase in economic activity.

### There is also evidence for nonlinear effects of vaccines: the marginal economic benefits change when vaccination rates are higher.

### Country-specific conditions play an important role, with lower economic gains if strict containment measures are in place or if the country is experiencing a severe outbreak.

---

# [5] Covid-19 and machine learning

- Developing accurate forecasting tools can help in our fight against the pandemic.
Prediction models that combine several features to estimate the risk of infection have been developed.

- These models aim to assist medical staff worldwide in triaging patients, especially when healthcare resources are limited.

---

# [6] Vitamin D and covid-19

https://www.frontiersin.org/articles/10.3389/fpubh.2021.736665/full

Several studies suggest an association between serum 25-hydroxyvitamin D (25OHD) and the likelihood of suffering severe symptoms of covid-19. Data supporting a significant effect of vitamin D in preventing and mitigating respiratory tract infections have also emerged. In this study of almost 1.5 million individuals, severe deficiency, deficiency, and insufficiency of vitamin D were all associated with ICU admission.

---

# Literature Review - [6?]

---
class: inverse, middle, center

# Our Data

---

## Our Data - details

Our dataset comes from the "Our World in Data" Covid-19 public data, together with data from JHU, WHO, CDC, and the World Bank. The data covers a wide range of variables:

- Basic Covid-19 data (cases, deaths)
- Hospital & ICU (ICU beds, ICU patients)
- Policy responses (stringency_index)
- Reproduction rate
- Tests & positivity
- Vaccinations
- Others (population, life_expectancy, GDP per capita, and so on)

---
class: inverse, middle, center

# Methods & Data Analyses

---

# Methods & Data Analyses - details

Our analyses have three major parts:

1. Preliminary exploration: summary statistics, scatter plots, correlations, and maps
2. Regression analyses: OLS, difference-in-differences, and regression discontinuity
3. Advanced models and machine learning algorithms

---
class: inverse, middle, center

# Results & Implications

---

# [1a] summary statistics

---

# [1a] summary statistics - Human Development Index (HDI)

.pull-left[
- Some text
- goes here
]

.pull-right[
```
## Selecting by human_development_index
```

```
## # A tibble: 10 × 2
##    location    human_development_index
##    <chr>                         <dbl>
##  1 Norway                        0.957
##  2 Ireland                       0.955
##  3 Switzerland                   0.955
##  4 Hong Kong                     0.949
##  5 Iceland                       0.949
##  6 Germany                       0.947
##  7 Sweden                        0.945
##  8 Australia                     0.944
##  9 Netherlands                   0.944
## 10 Denmark                       0.94
```
]
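---

The `## Selecting by human_development_index` message on the HDI slide is what dplyr's `top_n()` prints when no weighting column is given, so the top-10 table was likely produced that way. A minimal base-R sketch of the same kind of ranking follows, assuming a data frame with one row per country and the `location` and `human_development_index` columns from the Our World in Data file. The helper `top_by()` and the "Country A" to "Country D" rows are hypothetical, with made-up values for illustration only.

```r
# Hypothetical toy data: one row per country (values are illustrative only)
countries <- data.frame(
  location = c("Country A", "Country B", "Country C", "Country D"),
  human_development_index = c(0.71, NA, 0.93, 0.85),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n rows with the highest value of `column`,
# after dropping rows where that column is missing
top_by <- function(df, column, n = 10) {
  df <- df[!is.na(df[[column]]), , drop = FALSE]
  ranked <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  rownames(ranked) <- NULL
  head(ranked, n)
}

top_hdi <- top_by(countries, "human_development_index", n = 2)
print(top_hdi)
```

Unlike `top_n()`, which keeps tied rows and can therefore return more than `n` of them, this sketch always returns at most `n` rows.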
---

# Results & Implications [1b: preliminary analyses - correlation and heat map]

<img src="presentation_files/figure-html/correlation_heatmap-1.png" width="80%" />

---

# Results & Implications [1c: preliminary analyses - maps]

Include dynamic maps for vaccination

Alt (alternative) text:

1. chart type
2. type of data (x and y, color)
3. reason for including chart

```r
library(sf)

# Path to the world borders shapefile, relative to the knitting working directory
worldshapefile <- "../data/world_shape_file/TM_WORLD_BORDERS-0.3.shp"
shape <- st_read(dsn = worldshapefile)
```

```
## Reading layer `TM_WORLD_BORDERS-0.3' from data source
##   `/cloud/project/data/world_shape_file/TM_WORLD_BORDERS-0.3.shp' using driver `ESRI Shapefile'
## Simple feature collection with 246 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.6236
## Geodetic CRS: WGS 84
```

---

# Results & Implications [1d: preliminary analyses - scatter plots and other ggplots]

```r
covid_data %>%
  filter(date == "2022-02-20") %>%
  # rows without a continent are OWID aggregates (e.g. "World"), so drop them
  filter(!is.na(continent)) %>%
  mutate(vaccination_rate = people_fully_vaccinated / population) %>%
  ggplot(mapping = aes(x = vaccination_rate, y = human_development_index)) +
  geom_point(size = 2) +
  labs(title = "vaccination rate vs. human development index",
       subtitle = "xx",
       x = "vaccination rate",
       y = "human development index") +
  geom_smooth(color = "blue")
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

```
## Warning: Removed 131 rows containing non-finite values (stat_smooth).
```

```
## Warning: Removed 131 rows containing missing values (geom_point).
```

<img src="presentation_files/figure-html/vaccination-HDI-1.png" title="Scatterplot of human development index against the share of the population fully vaccinated, one point per country on 2022-02-20, with a LOESS smoothing line" alt="Scatterplot of human development index against the share of the population fully vaccinated, one point per country on 2022-02-20, with a LOESS smoothing line" width="80%" />

---

# Results & Implications [1d: preliminary analyses - scatter plots and other ggplots]

```r
covid_data %>%
  filter(date == "2022-02-20") %>%
  filter(!is.na(continent)) %>%
  mutate(vaccination_rate = people_fully_vaccinated / population) %>%
  mutate(death = total_deaths / population) %>%
  ggplot(mapping = aes(x = vaccination_rate, y = death)) +
  geom_point(size = 1) +
  # labels now match the mapped variables (x = vaccination_rate, y = death)
  labs(title = "deaths per capita vs. vaccination rate",
       subtitle = "xx",
       x = "vaccination rate",
       y = "total deaths / population") +
  geom_smooth(color = "blue")
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

```
## Warning: Removed 128 rows containing non-finite values (stat_smooth).
```

```
## Warning: Removed 128 rows containing missing values (geom_point).
```

<img src="presentation_files/figure-html/cases-diabetes-1.png" alt="Scatterplot of total deaths as a share of population against the share of the population fully vaccinated, one point per country on 2022-02-20, with a LOESS smoothing line" width="80%" />

---

# Results & Implications [1e: time series analyses]

---

```r
# Libraries
library(dygraphs)
library(xts) # to convert between data frames and xts time series

# One row per country and date; %in% keeps every row for the listed countries
covid_time <- covid_data %>%
  select(location, date, total_cases) %>%
  filter(location %in% c("China", "Japan", "India", "Brazil", "France",
                         "Germany", "Italy", "Mexico", "Russia",
                         "United Kingdom", "United States"))
```

```r
# Wide format: one row per date, one total_cases column per country
covid_time_wide <- pivot_wider(data = covid_time,
                               names_from = location,
                               values_from = total_cases)
```

```r
# Convert to an xts time series indexed by date (column 1 holds the dates)
don <- xts(x = covid_time_wide[, -1], order.by = covid_time_wide$date)

# Interactive time-series chart
p <- dygraph(don)
p
```
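---

When selecting several countries, `location == c(...)` compares the two vectors element by element and recycles the shorter one, so it is not a membership test and silently drops matching rows; `%in%` is the membership test. A minimal base-R sketch of the filter-then-widen step, assuming a long table with the Our World in Data column names `location`, `date`, and `total_cases`; the "Country A" to "Country C" rows and their values are hypothetical.

```r
# Hypothetical long-format data: one row per country per date (made-up values)
covid_long <- data.frame(
  location = rep(c("Country A", "Country B", "Country C"), each = 2),
  date = as.Date(rep(c("2022-02-19", "2022-02-20"), times = 3)),
  total_cases = c(10, 12, 20, 25, 30, 31),
  stringsAsFactors = FALSE
)
keep <- c("Country A", "Country C")

# `==` recycles `keep` along the column (A vs A, A vs C, B vs A, ...),
# so here only 2 of the 4 wanted rows survive; with non-multiple lengths R also warns
recycled <- covid_long[covid_long$location == keep, ]

# `%in%` keeps every row whose location appears in `keep`
selected <- covid_long[covid_long$location %in% keep, ]

# Long -> wide: one row per date, one total_cases.<country> column per country
covid_wide <- reshape(selected, idvar = "date", timevar = "location",
                      direction = "wide")
print(covid_wide)
```

In the deck itself, `pivot_wider()` plays the role of `reshape()` here, and the wide table is then passed to `xts()` with the date column as `order.by`.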
---

# Results & Implications [1e: time series analyses]

---

# Results & Implications - [2a: regression analyses - multiple linear regression]

```
## # A tibble: 5 × 5
##   term                    estimate std.error statistic p.value
##   <chr>                      <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)             -1136.      1829.     -0.621  0.539
## 2 human_development_index  5579.      2704.      2.06   0.0473
## 3 vaccination             -3302.      1928.     -1.71   0.0964
## 4 booster                   638.      1827.      0.349  0.729
## 5 stringency_index           -2.22      10.8    -0.204  0.839
```

---

# Results & Implications - [2a: regression analyses - multiple linear regression]

---

# Results & Implications - [2b: regression analyses - other]

1. Fixed effects
2. Difference-in-differences
3. Regression discontinuity

---

# Results & Implications - [3: machine learning]

1. Linear regression
2. Logistic regression
3. K-nearest neighbors (KNN)

https://github.com/allisonhorst/stats-illustrations/

---
class: inverse, middle, center

# Limitations and Potential Future Studies

---

# Limitations - details

---

# Future Studies

Future studies

---

# References

---